Poincare Maps projection on Thioredoxins starting from scratch

Data preparation

The user must provide the path to the input file in .mfasta format and the path to the output directory where intermediate files are stored:

All scripts necessary for data preparation are located in scripts/data_preparation:

Data preparation consists of cleaning the .mfasta file according to a gap threshold and translating each sequence into a PSSM profile:
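To make the gap-threshold idea concrete, here is a minimal sketch in plain Python. It is not the project's preparation script: it assumes that "cleaning" means dropping alignment columns whose gap fraction exceeds the threshold. The function names `read_mfasta` and `drop_gappy_columns` and the default value 0.9 are hypothetical. The PSSM translation step is not covered here.

```python
def read_mfasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_gappy_columns(records, gap_threshold=0.9):
    """Remove alignment columns whose gap fraction exceeds gap_threshold.

    Assumption: gaps are encoded as '-' and all sequences are aligned
    (same length). The default threshold is illustrative only.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")
    n = len(records)
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for _, seq in records) / n <= gap_threshold
    ]
    return [(h, "".join(seq[i] for i in keep)) for h, seq in records]


example = ">seq1\nAC-G\n>seq2\nA--G\n>seq3\nAT-G\n"
cleaned = drop_gappy_columns(read_mfasta(example), gap_threshold=0.5)
for header, seq in cleaned:
    print(header, seq)
```

In this toy alignment the third column is all gaps, so it is removed, while the second column (one gap out of three) is kept.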

Data projection using Poincaré disk

You can change the parameters of the projection here:

Then, the following command projects the encoded sequences onto a Poincaré disk:
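For intuition about the target space, the sketch below computes the standard hyperbolic distance between two points of the Poincaré disk, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It is independent of the project's projection code, and the function name `poincare_distance` is a hypothetical helper introduced only for illustration.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    norm_u = sum(x * x for x in u)  # squared Euclidean norm of u
    norm_v = sum(x * x for x in v)  # squared Euclidean norm of v
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


origin = (0.0, 0.0)
# Distances from the origin grow without bound as a point nears the boundary.
print(poincare_distance(origin, (0.5, 0.0)))
print(poincare_distance(origin, (0.9, 0.0)))
```

Because distances blow up near the boundary, points placed close to the edge of the disk are far from each other in hyperbolic terms even when they look close on screen.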

Default parameter values are given in the comments. The output is then written to the following file:

Projection visualization

One can visualize the resulting projection using any convenient coloring. To do so, the user should provide a .csv file in which each line corresponds to a protein:
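As an illustration of what such an annotation file might contain, the sketch below parses a small in-memory CSV with Python's standard `csv` module. The column names `protein_id` and `kingdom`, the sample rows, and the helper `load_labels` are all hypothetical; the actual file layout expected by the project may differ.

```python
import csv
import io

# Hypothetical annotation file: one protein per line, one label column.
annotation_csv = """protein_id,kingdom
seq1,Bacteria
seq2,Archaea
seq3,Eukaryota
"""


def load_labels(handle, id_column="protein_id", label_column="kingdom"):
    """Return a dict mapping each protein identifier to its label."""
    reader = csv.DictReader(handle)
    return {row[id_column]: row[label_column] for row in reader}


labels = load_labels(io.StringIO(annotation_csv))
print(labels["seq2"])
```

With a real file, the same function can be given an open file handle instead of the `io.StringIO` wrapper.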

A user can also create a custom color palette:
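One simple way to represent a custom palette is a dictionary from label to color, with a fallback for labels that have no assigned color. This is only a sketch: the label names, hex codes, and the helper `colors_for` are assumptions made for illustration, not the project's palette format.

```python
# Hypothetical palette: each annotation label maps to a hex color.
palette = {
    "Bacteria": "#1f77b4",
    "Archaea": "#d62728",
    "Eukaryota": "#2ca02c",
}
DEFAULT_COLOR = "#7f7f7f"  # used for labels missing from the palette


def colors_for(labels, palette, default=DEFAULT_COLOR):
    """Return one color per label, falling back to `default` when unknown."""
    return [palette.get(label, default) for label in labels]


point_labels = ["Bacteria", "Eukaryota", "Unknown"]
print(colors_for(point_labels, palette))
```

The fallback color keeps unannotated proteins visible on the plot instead of raising an error.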

Several examples of thioredoxin family visualization follow.

Thioredoxins by kingdom